Parallélisme des nids de boucles pour l'optimisation du temps d'exécution et de la taille du code. (Nested loop parallelism to optimize execution time and code size)

نویسنده

  • Yaroub Elloumi
چکیده

The real time implementation algorithms always include nested loops which require important execution times. Thus, several nested loop parallelism techniques have been proposed with the aim of decreasing their execution times. These techniques can be classified in terms of granularity, which are the iteration level parallelism and the instruction level parallelism. In the case of the instruction level parallelism, the techniques aim to achieve a full parallelism. However, the loop carried dependencies implies shifting instructions in both side of nested loops. Consequently, these techniques provide implementations with nonoptimal execution times and important code sizes, which represent limiting factors when implemented on embedded real-time systems. In this work, we are interested on enhancing the parallelism strategies of nested loops. The first contribution consists of purposing a novel instruction level parallelism technique, called “delayed multidimensional retiming”. It aims to scheduling the nested loops with the minimal cycle period, without achieving a full parallelism. The second contribution consists of employing the “delayed multidimensional retiming” when providing nested loop implementations on real time embedded systems. The aim is to respect an execution time constraint while using minimal code size. In this context, we proposed a first approach that selects the minimal instruction parallelism level allowing the execution time constraint respect. The second approach employs both instruction level parallelism and iteration level parallelism, by using the “delayed multidimensional retiming” and the “loop striping”.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Promises of Hybrid Hexagonal/Classical Tiling for GPU

Time-tiling is necessary for e cient execution of iterative stencil computations. But the usual hyper-rectangular tiles cannot be used because of positive/negative dependence distances along the stencil's spatial dimensions. Several prior e orts have addressed this issue. However, known techniques trade enhanced data reuse for other causes of ine ciency, such as unbalanced parallelism, redundan...

متن کامل

Optimal Tiling

Iteration space tiling is a common strategy used by parallelizing compilers to reduce communication overhead. We address the problem of determining the optimal tile size (which minimizes the total execution time of the program), for a particular program schema. We use a realistic model of the architecture which accounts for coprocessors that permit overlapping of communication and computation, ...

متن کامل

Repetitive Model Refactoring for Design Space Exploration of Intensive Signal Processing Applications

The efficient design of computation intensive multidimensional signal processing application requires to deal with three kinds of constraints: those implied by the data dependencies, the non functional requirements (real-time, power consumption) and the availability of resources of the execution platform. We propose here a strategy to use a refactoring tool dedicated to this kind of application...

متن کامل

فایل کامل مجلّه مطالعات زبان فرانسه دو فصلنامه علمی پژوهشی زبان فرانسه دانشکده زبانهای خارجی دانشگاه اصفهان

Tâ ÇÉÅ wx W|xâ Revue des Études de la Langue Française Revue semestrielle de la Faculté des Langues Étrangères de l'Université d'Ispahan Cinquième année, N° 8 Printemps-Eté 2013, ISSN 2008- 6571 ISSN électronique 2322-469X Cette revue est indexée dans: Ulrichsweb: global serials directory http://ulrichsweb.serialssolutions.com Doaj: Directory of Open Access Journals http://www.doaj.org ...

متن کامل

Java Bytecode Compression for Embedded Systems

A program executing on an embedded system or similar environment faces limited memory resources and xed time constrains. We demonstrate how factorization of common instruction sequences can be automatically applied to Java bytecode programs. Based on a series of experiments, we argue that program size is reduced by 30% on the average, typically with an execution time penalty of less than 30%. T...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013